The scenario for this tutorial: you have a series of values for one variable, such as the population density of different cities, and you need to classify them into groups according to that variable, e.g. very high, medium high, medium, medium low, and very low population density.
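In some cases you already have a GeoDataFrame/DataFrame; in other cases you just have a list that contains the numbers. So the following covers two major functions: tm.leveling_vector, which takes a GeoDataFrame/DataFrame and a column name, and bk.get_levels, which takes a list.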
Both functions take a breaking method (break_method in tm.leveling_vector, method in bk.get_levels), such as quantile (the default), head_tail_break, natural_break, equal_interval, or manual.
Both take the number of groups (break_N in tm.leveling_vector, N in bk.get_levels).
They also take custom cuts, passed as break_cuts to tm.leveling_vector.
First, import what is needed.
In [53]:
import geopandas as gpd # for reading and manipulating shapefiles
import matplotlib.pyplot as plt # for making figures
import seaborn as sns # for making distplot
from colouringmap import theme_mapping as tm # a function named leveling_vector in tm will be used
from colouringmap import breaking_levels as bk # a function named get_levels in bk will be used
# magic line so that matplotlib figures are shown inline in the jupyter cell
%matplotlib inline
Read a demo file and take a look.
In [3]:
grid_res = gpd.read_file('data/community_results.shp')
grid_res.head()
Out[3]:
Take a look at the data distribution using seaborn's distplot.
In [7]:
sns.distplot(grid_res['usercount'], kde=False)
Out[7]:
The plot above suggests that the data may follow an exponential distribution, so let's try a log scale on the y axis.
In [58]:
ax = sns.distplot(grid_res['usercount'], kde=False)
#ax.set_xscale("log", nonposx='clip')
ax.set_yscale("log", nonposy='clip')
The following is the simplest way of converting a column of a gdf to levels.
In [29]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount') #, break_method='quantile') #default method is quantile
Normally, level_list is assigned to a new column of the gdf. This is what I did in the other mapping functions.
In [18]:
grid_res['user_level'] = level_list
In [24]:
grid_res.head()
Out[24]:
cuts contains the break values, with the minimum and maximum at the two ends of the list.
In [30]:
cuts
Out[30]:
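To make the relation between cuts and level_list concrete, here is a minimal standalone sketch of how a list of cuts can be turned into levels. It does not use colouringmap: assign_levels is a hypothetical helper, the numbers are made up, and the boundary rule (a value equal to a break goes to the upper level, levels start at 0) is an assumption that may differ from what the library does.

```python
from bisect import bisect_right

def assign_levels(values, cuts):
    """Map each value to a 0-based level using sorted cut points.

    cuts is assumed to be [min, break_1, ..., break_k, max],
    so N levels need N + 1 cut points.
    """
    interior = cuts[1:-1]  # the two ends only bound the data
    # bisect_right counts the interior breaks that are <= v,
    # so a value equal to a break lands in the upper level (assumption)
    return [bisect_right(interior, v) for v in values]

values = [3, 15, 40, 41, 250, 1200]   # made-up numbers
cuts = [3, 40, 250, 1200]             # min, two breaks, max: three levels
levels = assign_levels(values, cuts)
print(levels)  # [0, 0, 1, 1, 2, 2]
```

The result has one level per input value, in the same order as the input.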
In [31]:
ax = sns.distplot(grid_res['usercount'], kde=False)
#ax.set_xscale("log", nonposx='clip')
ax.set_yscale("log", nonposy='clip')
for c in cuts:
    ax.axvline(x=c)
In [32]:
lev = list(set(level_list))
count = [ level_list.count(l) for l in lev ]
print(lev)
print(count)
The quantile method gives a similar count for each level.
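Why the counts come out similar can be seen in a small standalone sketch of quantile breaking. This is not the library's code: quantile_cuts is a hypothetical helper, the data is made up, and the nearest-rank lookup (no interpolation) is an assumption chosen for brevity.

```python
from bisect import bisect_right
from collections import Counter

def quantile_cuts(values, n_levels):
    """Return n_levels + 1 cut points at evenly spaced ranks of the sorted data."""
    ordered = sorted(values)
    last = len(ordered) - 1
    fractions = [i / float(n_levels) for i in range(n_levels + 1)]
    # nearest-rank lookup without interpolation (an assumption)
    return [ordered[int(f * last + 0.5)] for f in fractions]

values = [i * i for i in range(21)]   # made-up, right-skewed data: 0, 1, 4, ..., 400
cuts = quantile_cuts(values, 4)
levels = [bisect_right(cuts[1:-1], v) for v in values]
counts = sorted(Counter(levels).items())
print(cuts)    # [0, 25, 100, 225, 400]
print(counts)  # [(0, 5), (1, 5), (2, 5), (3, 6)]
```

The cuts are unevenly spaced in value, but each level holds about the same number of items. collections.Counter is also a compact alternative to the list(set(...)) and count() combination used in the cell above.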
Let's try some other break methods.
In [33]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='head_tail_break')
print(cuts)
In [34]:
ax = sns.distplot(grid_res['usercount'], kde=False)
#ax.set_xscale("log", nonposx='clip')
ax.set_yscale("log", nonposy='clip')
for c in cuts:
    ax.axvline(x=c)
In [35]:
lev = list(set(level_list))
count = [ level_list.count(l) for l in lev ]
print(lev)
print(count)
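For readers who have not met head/tail breaks before, the general idea is to split the data at its mean, keep the part above the mean (the head), and repeat on that head. The sketch below follows that idea only; head_tail_cuts is a hypothetical helper with made-up data, and its stopping rule (stop at n_levels or when a single value is left) is an assumption, not necessarily how colouringmap implements head_tail_break.

```python
def head_tail_cuts(values, n_levels):
    """Return [min, mean_1, mean_2, ..., max] by repeatedly splitting at the mean."""
    data = sorted(values)
    cuts = [data[0]]
    # each pass adds one break; stop at n_levels or when the head runs out
    while len(cuts) < n_levels and len(data) > 1:
        mean = sum(data) / float(len(data))
        head = [v for v in data if v > mean]
        if not head:          # all remaining values are equal
            break
        cuts.append(mean)
        data = head           # recurse into the head only
    cuts.append(max(values))
    return cuts

values = [1, 1, 1, 2, 2, 3, 4, 8, 16, 64]   # made-up heavy-tailed data
cuts = head_tail_cuts(values, 5)
print(cuts)                        # [1, 10.2, 40.0, 64]
print(head_tail_cuts(values, 2))   # [1, 10.2, 64]
```

In this sketch, asking for 5 levels returns only 3, because the head shrinks to a single value before the requested number is reached.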
In [36]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='natural_break')
print(cuts)
In [37]:
ax = sns.distplot(grid_res['usercount'], kde=False)
#ax.set_xscale("log", nonposx='clip')
ax.set_yscale("log", nonposy='clip')
for c in cuts:
    ax.axvline(x=c)
In [38]:
lev = list(set(level_list))
count = [ level_list.count(l) for l in lev ]
print(lev)
print(count)
In [39]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='equal_interval')
print(cuts)
In [40]:
ax = sns.distplot(grid_res['usercount'], kde=False)
#ax.set_xscale("log", nonposx='clip')
ax.set_yscale("log", nonposy='clip')
for c in cuts:
    ax.axvline(x=c)
In [41]:
lev = list(set(level_list))
count = [ level_list.count(l) for l in lev ]
print(lev)
print(count)
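Equal-interval breaking is the easiest method to write down by hand: the range between the minimum and the maximum is divided into steps of equal width. The following is a standalone sketch with a hypothetical helper equal_interval_cuts and made-up data; it illustrates the technique rather than the library's own code.

```python
from bisect import bisect_right
from collections import Counter

def equal_interval_cuts(values, n_levels):
    """Return n_levels + 1 cut points that are evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / float(n_levels)
    cuts = [lo + i * step for i in range(n_levels)]
    cuts.append(hi)   # use the exact maximum to avoid floating-point drift
    return cuts

values = [0, 3, 7, 12, 40, 100]   # made-up skewed data
cuts = equal_interval_cuts(values, 5)
levels = [bisect_right(cuts[1:-1], v) for v in values]
print(cuts)                             # [0.0, 20.0, 40.0, 60.0, 80.0, 100]
print(sorted(Counter(levels).items()))  # [(0, 4), (2, 1), (4, 1)]
```

With skewed data most values fall into the lowest level and some levels stay empty, which is the opposite of the quantile behaviour.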
The number of levels is set with the break_N parameter, which defaults to 5.
After setting break_N to N, the number of cuts becomes N+1, because the list contains both the smallest and the largest values.
In [43]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='head_tail_break', break_N=3)
print(cuts)
In [44]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='head_tail_break', break_N=5)
print(cuts)
In [45]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='head_tail_break', break_N=7)
print(cuts)
In [46]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='head_tail_break', break_N=9)
print(cuts)
Note what head_tail_break does as the number of levels increases.
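There are two ways of passing your own cuts through break_cuts: as quantile fractions between 0 and 1 with break_method='quantile', or as explicit break values with break_method='manual'. Either way, the call returns a cut list and a level_list that has the same length and the same order as the input vector.
NOTE that the cut list has to include the minimum and maximum values.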
In [55]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='quantile', break_cuts=[0.,.25,.5,.75,1.])
print(cuts)
In [56]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='quantile', break_cuts=[0.,0.1,.5,.99,1.])
print(cuts)
In [57]:
level_list, cuts = tm.leveling_vector(grid_res, 'usercount', break_method='manual', break_cuts=[0.0, 120, 490, 1200, 2200, 4506.0])
print(cuts)
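Since manual cuts have to cover the whole data range, it can help to check them before passing them on. The sketch below is a hypothetical pre-check, not part of colouringmap; check_manual_cuts and the sample values are made up, and how the library itself reacts to incomplete cuts is not covered here.

```python
def check_manual_cuts(values, cuts):
    """Raise ValueError unless cuts are ascending and span the whole data range."""
    if list(cuts) != sorted(cuts):
        raise ValueError("cuts must be in ascending order")
    if cuts[0] > min(values) or cuts[-1] < max(values):
        raise ValueError("cuts must include the minimum and maximum values")
    return True

values = [0.0, 85, 300, 4506.0]                    # made-up sample of the data
good_cuts = [0.0, 120, 490, 1200, 2200, 4506.0]    # the manual cuts used above
print(check_manual_cuts(values, good_cuts))        # True

try:
    check_manual_cuts(values, [120, 490])          # both ends are missing
except ValueError as err:
    print(err)   # cuts must include the minimum and maximum values
```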
Let's say you have a list instead of a DataFrame/GeoDataFrame.
In [47]:
a_list = grid_res['usercount'].tolist()
If you want to get the break levels, another function is provided (the one that tm.leveling_vector calls internally).
In [49]:
level_list, cuts = bk.get_levels(a_list, method='head_tail_break', N=5)
In [50]:
print(cuts)
In [52]:
len(level_list)==len(a_list)
Out[52]:
The resulting level_list is in the same sequence as the input a_list.
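Because the two lists line up item by item, they can be paired with zip. A minimal sketch, assuming nothing beyond that alignment; group_by_level is a hypothetical helper and the demo numbers are made up rather than taken from the shapefile.

```python
from collections import defaultdict

def group_by_level(values, levels):
    """Collect the input values under the level each one was assigned."""
    groups = defaultdict(list)
    for value, level in zip(values, levels):
        groups[level].append(value)
    return dict(groups)

demo_values = [5, 900, 40, 7, 1300]   # made-up input list
demo_levels = [0, 2, 1, 0, 2]         # made-up levels, one per value
groups = group_by_level(demo_values, demo_levels)
print(sorted(groups.items()))  # [(0, [5, 7]), (1, [40]), (2, [900, 1300])]
```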